Abstract

This paper considers the recovery of a low-rank matrix from an observed version that simultaneously contains 
both (a) erasures: most entries are not observed, and (b) errors: values at a constant fraction of (unknown) locations 
are arbitrarily corrupted. We provide a new unified performance guarantee on when the natural convex relaxation 
of minimizing rank plus support succeeds in exact recovery. Our result allows for the simultaneous presence of 
random and deterministic components in both the error and erasure patterns. On the one hand, corollaries obtained 
by specializing this one single result in different ways recover (up to poly-log factors) all the existing works 
in matrix completion, and sparse and low-rank matrix recovery. On the other hand, our results also provide the 
first guarantees for (a) recovery when we observe a vanishing fraction of entries of a corrupted matrix, and (b) 
deterministic matrix completion. 

I. Introduction 

Low-rank matrices play a central role in large-scale data analysis and dimensionality reduction. They 
arise in a variety of application areas, among them Principal Component Analysis (PCA), Multi-dimensional 
scaling (MDS), Spectral Clustering and related methods, ranking and collaborative filtering, etc. In all these 
problems, low-rank structure is used to either approximate a general matrix, or to correct for corrupted 
or missing data. 

This paper considers the recovery of a low-rank matrix in the simultaneous presence of (a) erasures:
most elements are not observed, and (b) errors: among the ones that are observed, a significant fraction at
unknown locations are grossly/maliciously corrupted. It is now well recognized that the standard, popular
approach to low-rank matrix recovery using SVD as a first step fails spectacularly in this setting [1].
Low-rank matrix completion, which considers only random erasures ([2], [3]), will also fail with even
just a few maliciously corrupted entries. In light of this, several recent works have studied an alternate
approach based on the natural convex relaxation of minimizing rank plus support. One approach
[4] provides deterministic/worst-case guarantees for the fully observed setting (i.e. only errors). Another
avenue [6], [7] provides probabilistic guarantees for the case when the supports of the error and erasure
patterns are chosen uniformly at random. Our work provides (often order-wise) stronger guarantees on
the performance of this convex formulation, as compared to all of these papers.

We present one main result, and two other theorems. Our main result, Theorem 1, is a unified
performance guarantee that allows for the simultaneous presence of both errors and erasures, and deterministic
and random support patterns for each. In order/scaling terms, this single result recovers as corollaries
all the existing results on low-rank matrix completion [2], [3], worst-case error patterns [4], and random
error and erasure patterns [6], [7] up to logarithm factors; we provide detailed comparisons in Section
II. More significantly, our result goes beyond the existing literature by providing the first guarantees for
random support patterns for the case when the fraction of entries observed vanishes as $n$ (the size of the
matrix) grows - an important regime in many applications, including collaborative filtering. In particular,
we show that exact recovery is possible with as few as $\Theta(rn\,\mathrm{polylog}(n))$ observed entries, even when a
constant fraction of these entries are errors.
Theorem 2 is also a unified guarantee, but with the additional assumption that the signs of the error
matrix are equally likely to be positive or negative. We are now able to show that it is possible to recover
the low-rank matrix even when almost all entries are corrupted. Again, our results go beyond the existing
work [6] on this case, because we allow for a vanishing fraction of observations.

Theorem 3 concentrates on the deterministic/worst-case analysis, providing the first guarantees when
there are both errors and erasures. Its specialization to the erasures-only case provides the first deterministic
guarantees for low-rank matrix completion (where existing work [2], [3] has concentrated on randomly
located observations). Specialization to the errors-only case provides an order improvement over the
previous deterministic results in [4], and matches the scaling of [5] but with a simpler proof.

Besides improving on known guarantees, all our results involve several technical innovations beyond 
existing proofs. Several of these innovations may be of interest in their own right, for other related 
high-dimensional problems. 



II. Main Contributions



A. Setup 

The problem: Suppose the matrix $C \in \mathbb{R}^{n_1 \times n_2}$ is the sum of an underlying low-rank matrix $B^* \in \mathbb{R}^{n_1 \times n_2}$
and a sparse "errors" matrix $A^* \in \mathbb{R}^{n_1 \times n_2}$. Neither the number, locations nor values of the non-zero entries
of $A^*$ are known a priori; indeed by "sparse" we just mean that $A^*$ has at least a constant fraction of
its entries being 0 - it is allowed to have a significant fraction of its entries being non-zero as well. We
consider the following problem: suppose we only observe a subset $\Phi \subseteq [n_1] \times [n_2]$ of the entries of $C$;
the remaining entries are erased. When and how can we exactly recover $B^*$ (and, by simple implication,
the entries of $A^*$ that are in $\Phi$)?



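The Algorithm: In this paper we are interested in the performance of the following convex program

$$
(\hat{A}, \hat{B}) \;=\; \arg\min_{A,B} \;\; \gamma \|A\|_1 + \|B\|_* \qquad \text{s.t.} \quad \mathcal{P}_\Phi(A + B) = \mathcal{P}_\Phi(C), \tag{1}
$$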



where the notation is that for any matrix $M$, $\|M\|_* = \sum_i \sigma_i(M)$ is the nuclear norm, defined to be the
sum of the singular values of the matrix, $\|M\|_1 = \sum_{i,j} |M_{ij}|$ is the elementwise $\ell_1$ norm, and $\mathcal{P}_\Phi(M)$ is
the matrix obtained by setting the entries of $M$ that are outside the observed set $\Phi$ to zero. Intuitively, the
nuclear norm acts as a convex surrogate for the rank of a matrix [8], and the $\ell_1$ norm as a convex surrogate
for its sparsity. Here $\gamma$ is a parameter that trades off between these two elements of the cost function, and
our results below specify how it should be chosen. As noted earlier, this program has appeared previously 
in [4], [7].
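To make the ingredients of program (1) concrete, the following is a minimal NumPy sketch of the observation operator, the two norms, and the objective and feasibility check. It is not a solver, and the function names (`project_observed`, `objective`, `is_feasible`) and the toy instance are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def project_observed(M, mask):
    """P_Phi: keep the entries where mask is True and set the rest to zero."""
    return np.where(mask, M, 0.0)

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def l1_norm(M):
    """Elementwise l1 norm of M."""
    return np.abs(M).sum()

def objective(A, B, gamma):
    """Cost of program (1): gamma * ||A||_1 + ||B||_*."""
    return gamma * l1_norm(A) + nuclear_norm(B)

def is_feasible(A, B, C, mask, tol=1e-9):
    """Check the constraint P_Phi(A + B) = P_Phi(C)."""
    return np.allclose(project_observed(A + B, mask),
                       project_observed(C, mask), atol=tol)

# Toy instance (hypothetical sizes and values, for illustration only).
rng = np.random.default_rng(0)
n, r = 6, 1
B_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
A_star = np.zeros((n, n))
A_star[0, 1], A_star[3, 4] = 5.0, -5.0
C = A_star + B_star
mask = rng.random((n, n)) < 0.7          # observed set Phi
gamma = 0.3                              # arbitrary value, not the paper's choice
truth_value = objective(project_observed(A_star, mask), B_star, gamma)
```

The pair $(\mathcal{P}_\Phi(A^*), B^*)$ is always feasible for (1); the theorems below concern when it is also the unique minimizer.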



Incoherence: We are interested in characterizing when the optimum of (1) recovers the underlying
(observed) truth, i.e., when $(\mathcal{P}_\Phi(\hat{A}), \hat{B}) = (\mathcal{P}_\Phi(A^*), B^*)$. Clearly, not all low-rank matrices $B^*$ can be
recovered exactly; in particular, if $B^*$ is both low-rank and sparse, it would be impossible to unambiguously
identify it from an added sparse matrix. To prevent such a scenario, we follow the approach taken in the
recent work [4], [7], [2], [9] and define incoherence parameters for $B^*$. Suppose the matrix $B^*$ with
rank $r \le \min(n_1, n_2)$ has singular value decomposition $U \Sigma V^\top$, where $U \in \mathbb{R}^{n_1 \times r}$, $V \in \mathbb{R}^{n_2 \times r}$
and $\Sigma \in \mathbb{R}^{r \times r}$. We say a given matrix $B^*$ is $\mu$-incoherent for some $\mu \in [1, \max(n_1, n_2)/r]$ if

$$
\max_i \|U^\top e_i\| \le \sqrt{\frac{\mu r}{n_1}}, \qquad
\max_j \|V^\top e_j\| \le \sqrt{\frac{\mu r}{n_2}}, \qquad
\|UV^\top\|_\infty \le \sqrt{\frac{\mu r}{n_1 n_2}},
$$

where the $e_i$'s are standard basis vectors of the proper length, and $\|\cdot\|$ represents the 2-norm of the vector.
Notice that all our results in the following subsections depend only on the product of $\mu$ and $r$.
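As an illustration of the incoherence parameter, the sketch below computes the smallest $\mu$ for which a given matrix satisfies the three standard incoherence conditions $\max_i \|U^\top e_i\|^2 \le \mu r / n_1$, $\max_j \|V^\top e_j\|^2 \le \mu r / n_2$ and $\|UV^\top\|_\infty^2 \le \mu r / (n_1 n_2)$. That these are the conditions intended here is an assumption of the sketch, and the function name `incoherence` is hypothetical.

```python
import numpy as np

def incoherence(B, r):
    """Smallest mu such that the rank-r matrix B meets the three
    incoherence conditions assumed in the lead-in."""
    n1, n2 = B.shape
    U, _, Vt = np.linalg.svd(B, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T                 # top-r singular vectors
    mu_rows = n1 / r * np.max(np.sum(U ** 2, axis=1))
    mu_cols = n2 / r * np.max(np.sum(V ** 2, axis=1))
    mu_joint = n1 * n2 / r * np.max(np.abs(U @ V.T)) ** 2
    return max(mu_rows, mu_cols, mu_joint)

flat = np.ones((4, 4))            # rank one, perfectly spread out
spike = np.zeros((4, 4))
spike[0, 0] = 1.0                 # rank one, but also sparse
mu_flat = incoherence(flat, 1)
mu_spike = incoherence(spike, 1)
```

The all-ones matrix attains the smallest possible value, while the single-spike matrix (which is both low-rank and sparse) has a much larger $\mu$, matching the intuition that such matrices cannot be separated from sparse errors.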



B. Unified Guarantee 

Our first main result is a unified guarantee that allows for the simultaneous presence of random and
adversarial patterns, for both errors and erasures. As mentioned in the introduction, this recovers all
existing results in matrix completion, and sparse and low-rank matrix decomposition, up to constants or
log factors. We now define three bounding quantities: $p_0$, $\tau$ and $d$.

Let $\Phi_d$ be any (i.e. deterministic) set of observed entries, and additionally let $\Phi_r$ be a randomly chosen
set such that each entry is in $\Phi_r$ with probability at least $p_0$. Thus, the overall set of observed entries is
$\Phi = \Phi_r \cap \Phi_d$, the intersection of the two sets. Let $\Omega = \Omega_r \cup \Omega_d$ be the support of $A^*$, again composed of
the union of a deterministic component $\Omega_d$, and a random component $\Omega_r$ generated by having each entry
be in $\Omega_r$ independently with probability at most $\tau$. Finally, consider the union $\Phi_d^c \cup \Omega_d$ of all deterministic
errors and erasures, and let $d$ be an upper bound on the maximum number of entries this set has in any
row, or in any column.
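The three quantities can be illustrated with a small simulation. The sketch below builds an observed set and an error support with both random and deterministic components and computes $d$ for the deterministic part. The specific deterministic patterns (erased diagonal, corrupted anti-diagonal) and the values of `n`, `p0` and `tau` are arbitrary choices made for this example.

```python
import numpy as np

def max_row_col_count(S):
    """Largest number of True entries in any single row or column of S."""
    return int(max(S.sum(axis=0).max(), S.sum(axis=1).max()))

rng = np.random.default_rng(1)
n, p0, tau = 50, 0.3, 0.1
idx = np.arange(n)

# Deterministic components (hypothetical pattern): the diagonal is erased,
# and the anti-diagonal is corrupted.
phi_d = np.ones((n, n), dtype=bool)
phi_d[idx, idx] = False
omega_d = np.zeros((n, n), dtype=bool)
omega_d[idx, n - 1 - idx] = True

# Random components: observed with probability p0, corrupted with probability tau.
phi_r = rng.random((n, n)) < p0
omega_r = rng.random((n, n)) < tau

phi = phi_r & phi_d          # overall observed set: intersection
omega = omega_r | omega_d    # overall error support: union

# d bounds the deterministic errors and erasures per row and per column.
d = max_row_col_count(~phi_d | omega_d)
```

Here every row and column contains exactly one erased and one corrupted deterministic entry, so `d` equals 2 regardless of the random draw.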

Theorem 1 (Unified Guarantee). Set $n = \min\{n_1, n_2\}$. There exist universal constants $C$, $\rho_r$, $\rho_s$ and $\rho_d$ -
each independent of $n$, $\mu$ and $r$ - such that, with probability greater than $1 - Cn^{-10}$, the unique optimal
solution of (1) with tradeoff parameter $\gamma = \frac{1}{32\sqrt{p_0 (d+1) n}}$ is equal to $(\mathcal{P}_\Phi(A^*), B^*)$ provided that

$$
p_0 \ge \rho_r \frac{\mu r \log^6 n}{n}, \qquad
\tau \le \rho_s, \qquad
d \le \rho_d \frac{n\, p_0^2}{\mu r \log^4 n}.
$$

Remark. (a) The conclusion of the theorem holds for a range of values of $\gamma$. We have chosen one of these
valid values. (b) Note that the above theorem treats errors and erasures differently. Treating erasures as
errors by filling missing entries with random ±1 and applying Theorem 2 leads to a weaker result, in
particular, $p_0 = \Omega\left(\sqrt{\mu r \log^6 n / n}\right)$.

Comparison with previous work. Recovery from deterministic errors was first studied in [4], [10],
which stipulate $d = O\left(\sqrt{n/(\mu r)}\right)$. Our theorem improves this bound to $d = O\left(\frac{n}{\mu r \log^4 n}\right)$. In Section II-D we
provide a more refined analysis for the deterministic case, which gives $d = O\left(\frac{n}{\mu r}\right)$. As this manuscript
was being prepared, we learned of an independent investigation of the deterministic case [5], which gives
similar guarantees. Our results also handle the case of partial observations, which has not been discussed
before [4], [5], [10].

Randomly located errors and erasures have been studied in [7]. Their guarantees require that $\tau = O(1)$,
and $p_0 = \Omega(1)$. Our theorem provides stronger results, allowing $p_0$ to be vanishingly small, in particular,
$\Theta\left(\frac{\mu r \log^6 n}{n}\right)$ when there is no additional deterministic component (i.e. $d = 0$). After the publication of the
conference version of this paper, we learned about [11]. They also deal with random errors and erasures,
but under a different observation model (sampling with replacement), and have scaling results comparable
to ours.

Previous work in low-rank matrix completion deals with the case when there are no errors or
deterministic erasures (i.e., $d, \tau = 0$). For this problem, our theorem matches the best existing bound
$p_0 = O\left(\frac{\mu r \log^2 n}{n}\right)$ [3], [12] up to logarithm factors. Our theorem also provides the first guarantee for
deterministic matrix completion under potentially adversarial erasures.

One prominent feature of our guarantees is that we allow adversarial and random erasures/errors to exist 
simultaneously. To the best of our knowledge, this is the first such result in low-rank matrix recovery /robust 
PCA. 

C. Improved Guarantee for Errors with Random Sign 

If we further assume that the errors in the entries in $\Omega_r \setminus \Omega_d$ have random signs, then one can recover
from an overwhelming fraction of corruptions.

Theorem 2 (Improved Guarantee for Errors with Random Sign). Under the same setup as Theorem 1,
further assume that the signs of $A^*$ in $\Omega_r \setminus \Omega_d$ are symmetric ±1 Bernoulli random variables independent
of all others. Then there exist absolute constants $C$, $\rho_r$ and $\rho_d$ independent of $n$, $\mu$ and $r$ such that, with
probability at least $1 - Cn^{-10}$, the unique optimal solution of (1) with tradeoff parameter $\gamma = \frac{1}{32\sqrt{p_0 (d+1) n}}$
is equal to $(\mathcal{P}_\Phi(A^*), B^*)$ provided that

$$
p_0 (1-\tau)^2 \ge \rho_r \frac{\mu r \log^6 n}{n}, \qquad
d \le \rho_d \frac{n\, p_0^2 (1-\tau)^2}{\mu r \log^4 n}.
$$

Remark. Note that $\tau$ may be arbitrarily close to 1 for large $n$. One interesting observation is that $p_0$
can approach zero faster than $1 - \tau$; this agrees with the intuition that correcting erasures with known
locations is easier than correcting errors with unknown locations.

Comparison with previous work. Dense errors with random locations and signs were considered in
[6]. They show that $\tau$ can be a constant arbitrarily close to 1 provided that all entries are observed and
$n$ is sufficiently large. Our theorem provides stronger results by again requiring only a vanishingly small
fraction of entries to be observed, in particular $p_0 = \Omega\left(\frac{\mu r \log^6 n}{n}\right)$. Moreover, Theorem 2 gives an explicit
scaling between $\tau$ and $n$ as $\tau = O\left(1 - \sqrt{\frac{\mu r \log^6 n}{p_0 n}}\right)$, with $\gamma$ independent of the usually unknown quantity
$\tau$. In contrast, [6] requires $\tau \le f(n)$ for some unknown function $f(\cdot)$ and uses a $\tau$-dependent $\gamma$.

D. Improved Deterministic Guarantee 

Our second main result deals with the case where the errors and erasures are arbitrary. As discussed in [4],
for exact recovery, the error matrix $A^*$ needs to be not only sparse but also "spread out", i.e. to not have any
row or column with too many non-zero entries. The same holds for unobserved entries. Correspondingly,
we require the following: (i) there are at most $d$ errors and erasures on each row/column, and (ii)
$\|M\| \le \eta d \|M\|_\infty$ for any matrix $M$ that is supported on the set of corrupted entries and unobserved
entries; here $\|M\| = \sigma_{\max}(M)$ is the largest singular value of $M$ and $\|M\|_\infty = \max_{i,j} |M_{ij}|$ is the
element-wise maximum magnitude of the elements of the matrix. Note that by [4, Proposition 3], we can
always take $\eta \le 1$. Also, let $\alpha = \sqrt{\frac{\mu r d}{n_1}} + \sqrt{\frac{\mu r d}{n_2}} + \sqrt{\frac{\mu r d}{\max(n_1, n_2)}}$.

Theorem 3 (Improved Deterministic Guarantee). For tradeoff parameter
$\gamma \in \left[ \frac{1}{1 - 2\alpha} \sqrt{\frac{\mu r}{n_1 n_2}},\; \frac{1 - \alpha}{\eta d} - \sqrt{\frac{\mu r}{n_1 n_2}} \right]$,
suppose

$$
\sqrt{\frac{\mu r d}{\min(n_1, n_2)}} \left( 1 + \sqrt{\frac{\min(n_1, n_2)}{\max(n_1, n_2)}} + \eta \sqrt{\frac{d}{\max(n_1, n_2)}} \right) \le \frac{1}{2}.
$$

Then, the solution to the problem (1) is unique and equal to $(\mathcal{P}_\Phi(A^*), B^*)$.

Remark. (a) Notice that we have $\sqrt{d}$ in the bound while [4] has $d$ in their bound. This improvement is
achieved by a different construction of the dual certificate presented in this paper. (b) If the condition
provided for exact recovery in [4] is satisfied, then the condition of Theorem 3 is satisfied
as well. This shows that our result is an improvement over the result in [4] in the sense that this result
guarantees the recovery of a larger set of matrices $A^*$ and $B^*$. Moreover, this bound implies that $n$ (for
square matrices) should scale with $dr$, which is another improvement compared to the $d^2 r$ scaling in [4].
(c) We construct the dual certificate by the method of least squares (first used in [2] in a different setting)
with tighter bounding. This theorem provides the same scaling result for $d$, $r$ and $n$ as that in the recent
manuscript [5]. However, our assumptions are closer to existing ones in matrix completion and sparse
and low-rank decomposition papers.

III. Proof of Theorems 1 and 2

In this section we prove our unified guarantees. The main roadmap is along the same lines as those in
the low-rank matrix recovery literature [2], [7], [9]; it consists of providing a dual matrix $Q$ that certifies
the optimality of $(\mathcal{P}_\Phi(A^*), B^*)$ for the convex program (1). In spite of this high-level similarity, challenges
arise because of the denseness of erasures/errors as well as the simultaneous presence of deterministic and
random components. This requires a number of innovative intermediate results and a new construction of
the dual certificate $Q$. We will point out how our analysis departs from previous works when we construct
the dual certificate in Section III-D.

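Before proceeding, we need to introduce some additional notation. Define the support of $A^*$ as
$\Omega = \{(i,j) : A^*_{i,j} \ne 0\}$. Let $\Gamma = \Phi \setminus \Omega$ be the set of entries that are observed and clean, then $\Gamma^c$ is the set
of entries that are corrupted or unobserved. Also, let $\Gamma_r = \Phi_r \setminus \Omega_r$ be the set of random observed clean
entries, and $\Gamma_d = \Phi_d \setminus \Omega_d$ the set of deterministic observed clean entries; so $\Gamma = \Gamma_r \cap \Gamma_d$. The projections $\mathcal{P}_\Gamma$, $\mathcal{P}_{\Gamma^c}$,
$\mathcal{P}_{\Gamma_r}$, and $\mathcal{P}_{\Gamma_d}$ are defined similarly to $\mathcal{P}_\Phi$. Set $E^* := \mathcal{P}_\Phi(\mathrm{sgn}(A^*))$, where $\mathrm{sgn}(\cdot)$ is the element-wise
signum function. For an entry set $\Omega_0$, we write $\Omega_0 \sim \mathrm{Ber}(p)$ if $\Omega_0$ contains each entry with probability $p$,
independent of all others; therefore $\Phi_r \sim \mathrm{Ber}(p_0)$, $\Omega_r \sim \mathrm{Ber}(\tau)$, and $\Gamma_r \sim \mathrm{Ber}(p_0(1 - \tau))$. We also define
a subspace $T$ as the span of all matrices that share either the same column space or the same row space
as $B^*$:

$$T = \left\{ U X^\top + Y V^\top : X \in \mathbb{R}^{n_2 \times r},\; Y \in \mathbb{R}^{n_1 \times r} \right\}.$$

For any matrix $M \in \mathbb{R}^{n_1 \times n_2}$, we can define its orthogonal projection to the space $T$ as follows:

$$\mathcal{P}_T(M) = U U^\top M + M V V^\top - U U^\top M V V^\top.$$

We also define the projection onto $T^\perp$, the orthogonal complement of $T$, as follows:

$$\mathcal{P}_{T^\perp}(M) = M - \mathcal{P}_T(M).$$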
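The projection onto $T$, $M \mapsto UU^\top M + MVV^\top - UU^\top M VV^\top$, and its complement $M \mapsto M - \mathcal{P}_T(M)$ can be checked numerically. The following sketch builds both operators for randomly generated orthonormal $U$ and $V$ (the sizes and the helper name `make_projections` are assumptions for illustration) and prepares the quantities needed to verify idempotence and orthogonality.

```python
import numpy as np

def make_projections(U, V):
    """Return the maps P_T and P_{T-perp} for orthonormal factors U and V."""
    PU, PV = U @ U.T, V @ V.T

    def P_T(M):
        return PU @ M + M @ PV - PU @ M @ PV

    def P_T_perp(M):
        return M - P_T(M)

    return P_T, P_T_perp

rng = np.random.default_rng(2)
n, r = 8, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
P_T, P_T_perp = make_projections(U, V)

M = rng.standard_normal((n, n))
inner = np.sum(P_T(M) * P_T_perp(M))   # trace inner product of the two parts
```

In particular $UV^\top$ lies in $T$, so it is left unchanged by the projection, and the two components of any $M$ are orthogonal under the trace inner product.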

In the sequel, we use $C$, $C'$ and $C''$ to denote unspecified positive constants, which might differ from
place to place; by with high probability we mean with probability at least $1 - C \min\{n_1, n_2\}^{-10}$. For
simplicity, we only prove the case of square matrices ($n_1 = n_2 = n$). All the proofs extend to the general
case by replacing $n$ by $\min\{n_1, n_2\}$. The proof has five steps. We elaborate on each of these steps in the
next five sub-sections.

A. Step 1: Sign Pattern Derandomization 

Following [7], the first step is to observe that it suffices to prove Theorem 2, which assumes random
signed errors in $\Omega_r \setminus \Omega_d$. The guarantee under arbitrary signed errors in Theorem 1 follows automatically
from Theorem 2 using a derandomization and elimination argument. This is given in the following lemma,
which is a straightforward generalization of [7, Theorems 2.2 and 2.3].

Lemma 1. Suppose $B^*$ obeys the conditions of Theorem 1. If the convex program (1) recovers $B^*$ with
high probability in the model where $\Omega_r \sim \mathrm{Ber}(2\tau)$ and the entries of $A^*$ in $\Omega_r \setminus \Omega_d$ have random signs, then
it also recovers $B^*$ with at least the same probability in the model where $\Omega_r \sim \mathrm{Ber}(\tau)$ and the signs are
arbitrarily fixed.

The basic idea of the proof is that, as long as $\tau$ is not too large, a fixed-signed error matrix $\mathcal{P}_{\Omega_r \setminus \Omega_d}(A^*)$
can be viewed as the trimmed version of a random-signed $\mathcal{P}_{\Omega_r \setminus \Omega_d}(\tilde{A}^*)$ with half of its entries set to zero;
moreover, successful recovery under $A^*$ is guaranteed by that under $\tilde{A}^*$, as the latter is a harder problem.
We refer the readers to [7, Theorems 2.2 and 2.3] for the rigorous proof of this argument. Proceeding
under the random-sign assumption makes it easier to construct the dual certificate $Q$. The next four steps
are thus devoted to the proof of Theorem 2.
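The trimming idea behind the derandomization step can be illustrated by simulation: start from a support drawn with probability $2\tau$ and independent random signs, then keep only those entries whose random sign happens to agree with a prescribed fixed sign pattern. The sketch below (with arbitrary illustrative values for `n` and `tau`, and a made-up fixed sign pattern) shows that the surviving support has density close to $\tau$ and carries exactly the fixed signs.

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 1000, 0.1

# An arbitrary sign pattern, treated as fixed in what follows (hypothetical choice).
fixed_sign = np.where(rng.random((n, n)) < 0.3, 1, -1)

# Random model: support ~ Ber(2 * tau) with independent symmetric signs.
support_2tau = rng.random((n, n)) < 2 * tau
random_sign = rng.choice([-1, 1], size=(n, n))

# Trimming: zero out every entry whose random sign disagrees with the fixed one.
keep = support_2tau & (random_sign == fixed_sign)
trimmed_sign = np.where(keep, random_sign, 0)

density = keep.mean()   # empirical probability that an entry survives
```

Each entry survives with probability $2\tau \cdot \tfrac{1}{2} = \tau$, independently of the others, which is why the fixed-sign model with parameter $\tau$ can be viewed as a trimmed instance of the random-sign model with parameter $2\tau$.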



B. Step 2: Invertibility under corruptions and erasures 

A necessary condition for exact recovery is that the set of uncorrupted and un-erased entries $\Gamma = \Gamma_r \cap \Gamma_d$
should uniquely identify matrices in the set $T$, so we need to show that the operator $\mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T$ is invertible
on $T$. This step is quite standard in the literature of low-rank matrix completion and decomposition, but
in our case requires a different proof. In fact, invertibility follows from the following stronger result.

Lemma 2. Suppose is a set of indices obeying Qq r^Ber(p), and satisfies d < —. Then with high 
probability, we have 

\\p-'VrVnonr,Vr-Vr\\ < ^ 

provided p > C^^^. 

Invertibility follows from specializing $\Omega_0 = \Gamma_r$. The lemma is stated in terms of a generic entry set $\Omega_0$
because it is invoked again elsewhere. Notice that this lemma is a generalization of [2, Theorem 4.1], as
$\Omega_0 \cap \Gamma_d$ involves both random and deterministic components. The proof is new, utilizing the properties of
both components, and is given in the appendix.
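The operator-norm deviation controlled in this step can be computed exactly for small problems by writing the operators as matrices acting on vectorized inputs. The sketch below does this for the purely random case (no deterministic component, i.e. every entry belongs to the deterministic clean set), using a perfectly incoherent rank-one pair $U, V$; these choices, the sizes, and the helper name `pt_matrix` are assumptions made for illustration, not the general setting of Lemma 2.

```python
import numpy as np

def pt_matrix(U, V):
    """Matrix of P_T acting on row-major vectorized n1 x n2 matrices.
    Uses the identity vec(A M B) = kron(A, B.T) vec(M) for row-major vec."""
    n1, n2 = U.shape[0], V.shape[0]
    PU, PV = U @ U.T, V @ V.T
    return (np.kron(PU, np.eye(n2)) + np.kron(np.eye(n1), PV)
            - np.kron(PU, PV))

n, p = 30, 0.8
U = np.ones((n, 1)) / np.sqrt(n)                            # incoherent column space
V = ((-1.0) ** np.arange(n)).reshape(n, 1) / np.sqrt(n)     # incoherent row space
PT = pt_matrix(U, V)

rng = np.random.default_rng(5)
mask = (rng.random(n * n) < p).astype(float)    # Omega_0 ~ Ber(p), vectorized

# Spectral norm of p^{-1} P_T P_{Omega_0} P_T - P_T.
deviation = np.linalg.norm(PT @ (mask[:, None] * PT) / p - PT, 2)
```

A deviation strictly below 1 already implies that $p^{-1}\mathcal{P}_T \mathcal{P}_{\Omega_0} \mathcal{P}_T$ is invertible on $T$, since its eigenvalues on $T$ then lie in an interval bounded away from zero.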



C. Step 3: Sufficient Conditions for Optimality 

The next step is to use convex analysis to write down the first-order sub-gradient sufficient condition
for $(\mathcal{P}_\Phi(A^*), B^*)$ to be the unique solution to (1). This is given in the following lemma. Recall that we
have defined $E^* := \mathcal{P}_\Phi(\mathrm{sgn}(A^*))$.

Lemma 3. Suppose $\gamma$, $p_0$, $\tau$ and $d$ satisfy the condition in Theorem 2. Then with high probability
$(\mathcal{P}_\Phi(A^*), B^*)$ is the unique solution to (1) if there is a dual certificate $Q = \gamma E^* + W$ obeying
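$$
\begin{aligned}
&\text{(a)} \quad \left\| \mathcal{P}_T W - (UV^\top - \gamma \mathcal{P}_T E^*) \right\|_F \le \frac{1}{n^2}, \\
&\text{(b)} \quad \mathcal{P}_{\Gamma^c} W = 0, \\
&\text{(c)} \quad \left\| \mathcal{P}_\Gamma W \right\|_\infty \le \frac{\gamma}{2}, \\
&\text{(d)} \quad \left\| \mathcal{P}_{T^\perp} W \right\| \le \frac{1}{4}, \\
&\text{(e)} \quad \left\| \gamma \mathcal{P}_{T^\perp} E^* \right\| \le \frac{1}{4}.
\end{aligned}
\tag{2}
$$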



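Proof: Observe that the conditions in the lemma imply $\mathcal{P}_{\Phi^c}(Q) = 0$, $\|\mathcal{P}_T(Q) - UV^\top\|_F \le \frac{1}{n^2}$,
$\|\mathcal{P}_{T^\perp}(Q)\| \le \frac{1}{2}$, $\mathcal{P}_\Omega(Q) = \gamma E^*$, and $\|\mathcal{P}_\Gamma Q\|_\infty \le \frac{\gamma}{2}$. Consider another feasible solution
$(\mathcal{P}_\Phi(A^*) + \Delta_2, B^* + \Delta_1)$ with $\Delta_1 \ne 0$, $\Delta_2 \ne 0$, and $\mathcal{P}_\Phi(\Delta_1 + \Delta_2) = 0$. Take $G_0 \in T^\perp$ and $F_0 \in \Gamma$ such that $\|G_0\| = 1$,
$\|F_0\|_\infty = 1$, $\langle G_0, \Delta_1 \rangle = \|\mathcal{P}_{T^\perp} \Delta_1\|_*$ and $\langle F_0, \Delta_2 \rangle = \|\mathcal{P}_\Gamma \Delta_2\|_1$; such $G_0$ and $F_0$ exist due to the duality
between $\|\cdot\|_*$ and $\|\cdot\|$, and that between $\|\cdot\|_1$ and $\|\cdot\|_\infty$. We then have

$$
\begin{aligned}
&\|B^* + \Delta_1\|_* + \gamma \|\mathcal{P}_\Phi(A^*) + \Delta_2\|_1 - \|B^*\|_* - \gamma \|\mathcal{P}_\Phi(A^*)\|_1 \\
&\quad \ge \langle UV^\top + G_0, \Delta_1 \rangle + \gamma \langle E^* + F_0, \Delta_2 \rangle \\
&\quad = \langle UV^\top + G_0 - Q, \Delta_1 \rangle + \langle \gamma E^* + \gamma F_0 - Q, \Delta_2 \rangle \\
&\quad = \langle G_0 - \mathcal{P}_{T^\perp}(Q) - (\mathcal{P}_T(Q) - UV^\top), \Delta_1 \rangle + \langle \gamma F_0 - \mathcal{P}_\Gamma(Q), \Delta_2 \rangle \\
&\quad \ge \|\mathcal{P}_{T^\perp} \Delta_1\|_* \left( 1 - \|\mathcal{P}_{T^\perp}(Q)\| \right) - \|\mathcal{P}_T(Q) - UV^\top\|_F \|\mathcal{P}_T \Delta_1\|_F + \|\mathcal{P}_\Gamma \Delta_2\|_1 \left( \gamma - \|\mathcal{P}_\Gamma(Q)\|_\infty \right) \qquad (3) \\
&\quad \ge \frac{1}{2} \|\mathcal{P}_{T^\perp} \Delta_1\|_* - \frac{1}{n^2} \|\mathcal{P}_T \Delta_1\|_F + \frac{\gamma}{2} \|\mathcal{P}_\Gamma \Delta_2\|_1;
\end{aligned}
$$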



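here we use the sub-gradients of $\|\cdot\|_*$ and $\|\cdot\|_1$ in the first inequality and the Cauchy-Schwarz inequality in
(3). We need to upper-bound $\|\mathcal{P}_T \Delta_1\|_F$. Notice that w.h.p.

$$
\begin{aligned}
\|\mathcal{P}_\Gamma \mathcal{P}_T \Delta_1\|_F^2
&= \langle \mathcal{P}_T \Delta_1, \mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T \Delta_1 \rangle \\
&= \langle \mathcal{P}_T \Delta_1, \mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T \Delta_1 - p_0(1 - \tau) \mathcal{P}_T \Delta_1 + p_0(1 - \tau) \mathcal{P}_T \Delta_1 \rangle \\
&\ge p_0(1 - \tau) \|\mathcal{P}_T \Delta_1\|_F^2 - \frac{1}{2} p_0(1 - \tau) \|\mathcal{P}_T \Delta_1\|_F^2 \\
&= \frac{1}{2} p_0(1 - \tau) \|\mathcal{P}_T \Delta_1\|_F^2;
\end{aligned}
$$

here in the inequality we use Lemma 2 with $\Omega_0 = \Gamma_r$ and $p = p_0(1 - \tau)$. It follows that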

||PrA2||i > ||PrA2||p= ||PrAi||^ 
= \\VtVt Ai + VtVt^ Ai\\p 
> llPrPrAill^- llPrPr^Aill^ 

^ y^°^||PrA,||,-||P^xA,||, 

- /f ll^rAill^-llPr-AiL, 
where the last inequality holds under the assumptions in Theorem [2j Substituting back to (|3]), we obtain 

115* + AiL + 7 \\V^{A*) + A2II1 - II51L - 7 

> ll^r.A,|L(i-|) + ||PpA2||,(|-| 

> 0, 

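where we use $\gamma \le 1$. We claim that the above inequality is strict. Suppose it is not, then we must have
$\mathcal{P}_{T^\perp} \Delta_1 = \mathcal{P}_\Gamma \Delta_2 = 0$. But under the assumptions in Theorem 2, $\mathcal{P}_T \mathcal{P}_\Gamma \mathcal{P}_T$ is invertible by Lemma 2 and
thus $\Gamma^c \cap T = \{0\}$, which contradicts $\Delta_1 \ne 0$ and $\Delta_2 \ne 0$. ■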



D. Step 4: Construction of the Dual Certificate 

We need to show the existence of a matrix $W$ obeying the conditions in (2) in Lemma 3. We will construct
$W$ using a variation of the so-called Golfing Scheme [3], [12]. Here we briefly explain the idea. Consider
the left hand side of condition (a) in (2) as the "error" of approximating $UV^\top - \gamma \mathcal{P}_T E^*$ by $\mathcal{P}_T W$; we
want the error to be small. First observe that the choice of $W = UV^\top - \gamma \mathcal{P}_T E^*$ satisfies (a) strictly
but violates (b). To enforce (b), one might consider sampling according to $\Gamma$, the set of observed clean
entries, and define

$$W_1 = (p_0(1 - \tau))^{-1} \mathcal{P}_\Gamma \left( UV^\top - \gamma \mathcal{P}_T E^* \right).$$

With the choice of $W = W_1$, (b) is satisfied, and one expects the error in (a) to also be small because
its expectation equals $-\mathcal{P}_T \mathcal{P}_{\Gamma_d^c} (UV^\top - \gamma \mathcal{P}_T E^*)$, which is small as long as $\mathcal{P}_T \mathcal{P}_{\Gamma_d^c}$ is a contraction. This
intuition is largely true except that the error is still not small enough. To correct this bias, it is natural
to compensate by subtracting the remaining error from $W_1$, and then sample again. Indeed, if one sets
$W_2 = W_1 - (p_0(1 - \tau))^{-1} \mathcal{P}_\Gamma \left( \mathcal{P}_T W_1 - (UV^\top - \gamma \mathcal{P}_T E^*) \right)$, then $W = W_2$ still satisfies (b), and the error
in (a) becomes smaller. By repeating this "correct and sample" procedure, the error actually decreases
geometrically fast.

This is almost exactly how we are going to construct $Q$; the only modification is that for technical
reasons we need to decompose the observed clean entry set $\Gamma$ into independent batches and sample
according to a different batch at each step. To this end, we think of $\Omega_r^c \sim \mathrm{Ber}(1 - \tau)$ as $\bigcup_{1 \le k \le k_0} \Omega^{(k)}$ and
$\Phi_r \sim \mathrm{Ber}(p_0)$ as $\bigcup_{1 \le k \le k_0} \Phi^{(k)}$, where the sets $\Omega^{(k)} \sim \mathrm{Ber}(q_1)$ and $\Phi^{(k)} \sim \mathrm{Ber}(q_2)$ are independent; here
$k_0$ is taken to be $\lceil 4 \log n \rceil$, and $q_1, q_2$ obey $1 - \tau = 1 - (1 - q_1)^{k_0}$ and $p_0 = 1 - (1 - q_2)^{k_0}$. Observe
that $q_1 \ge (1 - \tau)/k_0$ and $q_2 \ge p_0/k_0$. One can verify that $\Omega_r$ and $\Phi_r$ have the same distribution as before.
Define $\Gamma^{(k)} = \Omega^{(k)} \cap \Phi^{(k)}$, which can be considered as the $k$-th batch of (random) observed clean entries;
we then have $\Gamma^{(k)} \sim \mathrm{Ber}(q)$ with $q := q_1 q_2 \ge \frac{p_0 (1 - \tau)}{k_0^2} \ge C \frac{\mu r \log n}{n}$, where $C$ may become arbitrarily large
by selecting $\rho_r$ sufficiently large. Define the operator $\mathcal{R}_{\Gamma^{(k)}} : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ as

$$\mathcal{R}_{\Gamma^{(k)}}(M) \triangleq q^{-1} \mathcal{P}_{\Gamma^{(k)} \cap \Gamma_d}(M) = \sum_{(i,j) \in \Gamma^{(k)} \cap \Gamma_d} q^{-1} M_{i,j} \, e_i e_j^\top,$$

which is simply the (properly scaled) projection onto the $k$-th batch of observed clean entries. The matrix
$W$ is then constructed as $W = W_{k_0}$, where $W_{k_0}$ is defined recursively by $W_0 := 0$ and

$$W_k := W_{k-1} + \mathcal{R}_{\Gamma^{(k)}} \left( UV^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_{k-1} \right), \qquad \text{for } k = 1, 2, \ldots, k_0.$$

The previous work [7] also applies the Golfing Scheme, but only to the part of the dual certificate that
involves $UV^\top$; for the part that involves $E^*$, they use the method of least squares. We utilize the Golfing
Scheme for both parts of the certificate. Difficulties arise due to the dependence between $E^*$ and the $\Gamma^{(k)}$'s,
and a new analysis is needed for the validation of the certificate. This crucial difference allows us to go
beyond and handle a vanishing fraction of observations and/or clean entries.
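The "correct and sample" recursion can be simulated directly. The sketch below is a simplified illustration rather than the construction analyzed in the proof: it draws a fresh independent Ber($q$) batch at every step, restricts it to the clean entries, has no deterministic component, and uses arbitrary values for `n`, `r`, `q`, `k0`, `gamma` and the corruption rate. It tracks the Frobenius-norm error $\|UV^\top - \gamma \mathcal{P}_T E - \mathcal{P}_T W_k\|_F$ across iterations.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, q, k0, gamma = 100, 2, 0.7, 10, 0.05   # illustrative values only

U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
PU, PV = U @ U.T, V @ V.T

def P_T(M):
    """Projection onto the subspace T spanned by U X^T + Y V^T."""
    return PU @ M + M @ PV - PU @ M @ PV

# Random sign pattern on a sparse corrupted set (5% of the entries).
E = np.where(rng.random((n, n)) < 0.05,
             rng.choice([-1.0, 1.0], size=(n, n)), 0.0)
clean = (E == 0)

target = U @ V.T - gamma * P_T(E)    # the quantity P_T W should approximate
W = np.zeros((n, n))
errors = [np.linalg.norm(target - P_T(W))]

for k in range(k0):
    batch = (rng.random((n, n)) < q) & clean     # k-th batch of clean entries
    residual = target - P_T(W)                   # current error, lies in T
    W = W + np.where(batch, residual, 0.0) / q   # scaled sampling of the error
    errors.append(np.linalg.norm(target - P_T(W)))
```

By construction `W` is supported only on clean entries, which mirrors condition (b), while the recorded `errors` shrink rapidly from one step to the next, which mirrors condition (a).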



E. Step 5: Validity of the Dual Certificate 

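It remains to show that $Q$ satisfies all the constraints in the optimality condition (2) simultaneously.
The equality (b) is immediate by the construction of $Q$ and $W$. To prove the inequalities, one observes
that if we denote the $k$-th step error as $D_k := UV^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_k$, then $D_k$ satisfies the following
recursion

$$
\begin{aligned}
D_k &= UV^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_k \\
&= (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T)(UV^\top - \gamma \mathcal{P}_T E^* - \mathcal{P}_T W_{k-1}) \\
&= (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T) D_{k-1},
\end{aligned}
\tag{4}
$$

and $W_{k_0}$ can be expressed as

$$W_{k_0} = \sum_{k=1}^{k_0} \mathcal{R}_{\Gamma^{(k)}} D_{k-1}. \tag{5}$$

We are now ready to prove that $W = W_{k_0}$ satisfies the four inequalities in (2) under our assumptions.
The proof uses Lemmas 11-15 in the Appendix.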

Inequality (a): Bounding $\|\mathcal{P}_T W - (UV^\top - \gamma \mathcal{P}_T E^*)\|_F$
Thanks to (4), we have the following geometric convergence

= \\{Vr - Pr^r(^o)^r) ■ ■ ■ (Pr - Pr^r(i)^r)^o|li. 
< \f[\\Vr-Vrnri.,Vr\\] \\UV^ --fVrE*\\^ 



\k=l 



< e-'''^ {\\UV^\\^ + j\\VrE*\\p) 

(a) (m) ^ 

< n (n + 7n) < 



n 



here (i) uses Lemma 2, (ii) uses $\|\mathcal{P}_T E^*\|_F \le \|E^*\|_F \le n$, and (iii) is due to our choice of $\gamma$. This proves
inequality (a) in (2).

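Inequality (c): Bounding $\|\mathcal{P}_\Gamma W\|_\infty$

We write

$$\prod_{i=1}^{k} (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T) = (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T) \cdots (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(1)}} \mathcal{P}_T),$$

where the order of multiplication is important. Then we have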

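$$
\begin{aligned}
\|\mathcal{P}_\Gamma W\|_\infty = \|W_{k_0}\|_\infty
&\overset{(i)}{\le} \sum_{k=1}^{k_0} \left\| \mathcal{R}_{\Gamma^{(k)}} D_{k-1} \right\|_\infty \\
&\overset{(ii)}{=} \sum_{k=1}^{k_0} \left\| \mathcal{R}_{\Gamma^{(k)}} \prod_{i=1}^{k-1} (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T)(UV^\top - \gamma \mathcal{P}_T E^*) \right\|_\infty \\
&\le \sum_{k=1}^{k_0} \left\| \mathcal{R}_{\Gamma^{(k)}} \prod_{i=1}^{k-1} (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T)(UV^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^*) \right\|_\infty \\
&\qquad + \sum_{k=1}^{k_0} \left\| \mathcal{R}_{\Gamma^{(k)}} \prod_{i=1}^{k-1} (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T)(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*) \right\|_\infty \\
&\le \sum_{k=1}^{k_0} \left\| \mathcal{R}_{\Gamma^{(k)}} \prod_{i=1}^{k-1} (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T)(UV^\top - \gamma \mathcal{P}_T \mathcal{P}_{\Omega_d} E^*) \right\|_\infty \\
&\qquad + q^{-1} \sum_{k=1}^{k_0} \left\| \prod_{i=1}^{k-1} (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T)(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*) \right\|_\infty;
\end{aligned}
$$

here (i) uses (5) and (ii) uses (4). We bound the above two terms separately.
The first term is bounded as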



fco 



< 



k-l 



i=l 



k=l 

-'EG 

k=l ^ 

< C /° 

" Po(l - r) 

(ii) U2 

< C- 



k-l 



\UV^ ~^VrVn,E* 



Po(l-r) 



UV^ -^VrVn,E*\ 
fir 

+ 7a 




(Hi) I 

< -7; 
- 4 



(6) 



(7) 
Here (i) uses the second part of Lemma 13 with ilo = T*^*^^ and 63 = |, as well as the fact that a < |
under the assumptions of Theorem |2| (ii) uses the incoherence assumptions and Lemma 14 and (iii) holds 
under the assumptions of Theorem [2} 

For the second term, we cannot use the above argument, because $E^* = \mathcal{P}_\Phi(\mathrm{sgn}(A^*))$ is not independent
of the $\Gamma^{(k)}$'s and thus Lemma 13 does not apply. Instead, we need to utilize the random signs of
$E^* := \mathcal{P}_\Phi(\mathrm{sgn}(A^*))$ (a similar argument appeared in [7]). Consider the $k$-th term in the sum. We have



-1 



k-l 



7g max 

a,b 



75 max 

a,b 



HiVr - Vrnri.,Vr){iVrVn,.\n,E*) 

k-l 

i=l 

k-l 

YliVr-Vrnn^-.^Vr) {caej) , V^n{nAn,) (sgn(A*)) 



ii=l 



here in the last equality we use the self-adjointness of the operators. Conditioned on $\Phi$, $\Omega$ and the $\Gamma^{(k)}$'s,
$\mathcal{P}_{\Phi \cap (\Omega_r \setminus \Omega_d)}(\mathrm{sgn}(A^*))$ has i.i.d. symmetric ±1 entries, so Hoeffding's inequality gives



P I 7g"^ 



k-l 



< 2 exp 



< 2 exp 



i=l 



> ^|$,^],F(^)'s 



2t' 



\ 
( 



2t2 



2 



< 2 exp 



7V^nf=/ ||^r-Pr7^^(-oPr||'¥. 



(8) 



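here the last inequality uses $\|\mathcal{P}_T(e_a e_b^\top)\|_F^2 \le \frac{2\mu r}{n}$, which follows from the incoherence assumptions.
Conditioned on the event $G_k := \left\{ \|\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T\| \le \frac{1}{2},\; i = 1, \ldots, k-1 \right\}$, we can integrate out the
conditions in (8) and obtain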



< 2 exp 



I k-l 



*n(a\nd) 



(sgn(A*)) 



>t\G, 



7V^ (D 



By Lemma 



, k-l 



■yfir log n 



qn 



we know that the event holds with high probability. Choosing t = C (^) 
with G sufficiently large and using union bound (there is only polynomially many different (a, 6)), we 
conclude that 



fc-i 



\[{Vt - VTn^i.,Vr)iiVrVn,.\n,E*] 



i=l 



< G 



k-l 



^fir logn 
qn 



< 
with high probability; here the second inequality holds because $q \ge C \frac{\mu r \log n}{n}$ by our choice. Summing
over $k$, it follows that



ko 



k=l 



k-1 



l[{Vr - Vrn^,.,Vr){lVrVnAn,E* 



i=l 



1 

< -7. 

- 4 ^ 



(9) 



Combining (7) and (9) proves inequality (c) in (2).
Inequality (d): Bounding $\|\mathcal{P}_{T^\perp} W\|$
We have



ko 



k=l 



(n) 



(Hi) 
< 



< 



ko 




El 


Vr± (7^ 


k=l 




ko 




E 




k=l 




ko 




E 




k=l 


ko 

+E < 

k=l 



'k-l. 



k-l 

n 

i=l 
k-l 



i=l 

k-l 



i=l 



(10) 



here (i) uses (5), (ii) uses $D_k \in T$, and (iii) uses (4). We bound the above two terms separately.
The first term is bounded as



ko 

E 

k=l 



k-l 



(0 
< 


'i 


(ii) 




< 


2C 


(Hi) 




< 


2C 


(iv) 


1_ 


< 


0^ ' 



1=1 

nlogn \ 



k=l 



k-l 



i=l 



' n log n 



+ d] llUV'' - ^VrVn,E*\ 



I n log n 



+ d 



fir 




+ 7a 



(11) 



here (i) uses the second part of Lemma 12 with $\Omega_0 = \Gamma^{(k)}$, (ii) uses (6), (iii) uses the incoherence
assumptions and Lemma 14, and (iv) holds under the assumption of Theorem 2.

For the second term in (10), the above argument fails due to the dependence between $\mathcal{P}_{\Omega_r \setminus \Omega_d} E^*$ and the
$\Gamma^{(k)}$'s. Again we rely on the random signs of $\mathcal{P}_{\Omega_r \setminus \Omega_d} E^* = \mathcal{P}_{\Phi \cap (\Omega_r \setminus \Omega_d)}(\mathrm{sgn}(A^*))$; the situation is more
complicated here as we need to use an $\epsilon$-net argument to bound the operator norms.

The key idea is to observe that, though independence does not hold, conditional independence does
- the $\Gamma^{(k)}$'s and $E^*$ are independent conditioned on $\Omega$. This is because $\mathrm{supp}(E^*) \subseteq \Omega$ is a random subset
of the corrupted entries while the $\Gamma^{(k)} \subseteq \Omega^c$ are random subsets of the un-corrupted entries. To isolate this
independence, we telescope the operators in the second term in (10). For $k = 1, \ldots, k_0$, define the operators



7^T 



n 



Observe that $\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(k)}} \mathcal{P}_T = \mathcal{A}_k + \mathcal{S}_k$, and $\mathcal{R}_{\Gamma^{(k)}} - \mathcal{I} = \mathcal{B}_k + \mathcal{T}_k$. The reason for doing so is that,
conditioned on $\Omega$, the $\mathcal{T}_k$'s and $\mathcal{S}_i$'s are independent of $E^*$. Thus if a term only involves $\mathcal{T}_k$ and $\mathcal{S}_i$'s (we call
it a Type-1 term), it can be bounded in a similar way as the first term in (10) using Lemmas 12 and 13.
For the other terms that involve not only $\mathcal{T}_k$ and $\mathcal{S}_i$'s, but also $\mathcal{A}_i$'s and/or $\mathcal{B}_k$ (dubbed Type-2 terms), we
bound them using the random signs of $E^*$. (It turns out that if one bounds the Type-1 term using the random
signs, the resulting bound is not strong enough, so we need to distinguish these two cases.)

Now for the details. Consider the $k$-th term in the summands of the second term in (10). Using the above
definitions, we have
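$$
(\mathcal{R}_{\Gamma^{(k)}} - \mathcal{I}) \prod_{i=1}^{k-1} (\mathcal{P}_T - \mathcal{P}_T \mathcal{R}_{\Gamma^{(i)}} \mathcal{P}_T)(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*)
= (\mathcal{B}_k + \mathcal{T}_k) \prod_{i=1}^{k-1} (\mathcal{A}_i + \mathcal{S}_i)(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*).
\tag{12}
$$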



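We expand the product and sums in the above equation, which results in a sum of $2^k = \mathrm{poly}(n)$ terms since
$k \le k_0 = O(\log n)$. Among them there is one Type-1 term

$$\mathcal{T}_k \mathcal{S}_1 \mathcal{S}_2 \cdots \mathcal{S}_{k-1} (\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*),$$

and $2^k - 1$ Type-2 terms, such as

$$\mathcal{T}_k \mathcal{A}_1 \mathcal{S}_2 \mathcal{S}_3 \cdots \mathcal{A}_{k-2} \mathcal{S}_{k-1} (\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*),$$
$$\mathcal{B}_k \mathcal{S}_1 \mathcal{A}_2 \mathcal{S}_3 \cdots \mathcal{S}_{k-2} \mathcal{A}_{k-1} (\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*).$$

We first bound the Type-1 term. Conditioned on $\Omega$, we have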

||7I.>Si>S2 • ■ ■ Sk-ii'jVr'Pn^\n,E* 



(13) 



n('=)nrj 



< 



(ii) 
< 



(Hi) 
< 




k-l 



n 



\ \qiQ2 



VrV, 



<i>(^)n(n(^)nrd) 



Po log n 



1 

16 



here in (i) we apply the first part of Lemma 



12 



with fin = and Tr 



part of Lemma 13 with fio = Tq = fi*^*-* fl Fd and £3 = \qi, (ii) uses Lemma 



fi^'') nFd, as well as the first 
and (iii) holds under 



the assumption of Theorem |2j 



15 






We next bound the remaining $2^k - 1$ Type-2 terms. To this end, we first collect five useful inequalities.
Because $\Omega^{(k)} \sim \mathrm{Ber}(q_1)$, the second part of Lemma 11 with $\Omega_0 = \Omega^{(k)}$ and $\epsilon_1 = C\sqrt{\mu r \log n/(n q_1)}$ gives that w.h.p.

\\Vr-Vrn^(,)Vr\\ 



1 1 "^''^ 

< C. 



nqi 



n 



log n 



(14) 



The first part of Lemma 11 with $\Omega_0 = \Omega^{(k)}$ and $\Gamma_0 = \Gamma_d$ shows that w.h.p.

1 



VrBkW = 



< 



< 



1 



l^r^nwnrjl + ll^rll 



|^r^n(^)nr,^r|| + 1 



qi 



qi \\VrVr,Vr\\ + l<CJ-<C'J^ 

V gi V i 



/ log n 



(15) 



Similarly, we have w.h.p. 



< 



1 



qiq2 



'Pr'P$(fc)no(fc) VrVQ(k) 

qiq2 qi 

1 

+ 



Vr - -VrVnik) 
qi 



< cJ— + cJ-<c\ 
V qiq2 V qi 



log^ n 
Po(l-r) 



(16) 



fio = ^^'^ n Fo = Fd, ei = gives w.h.p. 



Applying the first part of Lemma 11 twice with (1) VLq = Fq = Fa, ei = C J and (2) 



< 



Sk 
1 

qi 



1 



qiq2 



I fir log n 
nqi 



/ iir log n 
nqiq2 



< C\ 



^r'P$(fc)n(n(''-)nr,,)'^r 



VtVv.Vt VtV< 

$(fc)nf7(fe))nrd Vt 

qiq2 

firlog^n ^ 1 



npo{l - r) 4 



(17) 



Finally, since $ fl (r2r\f2d) ^ ^ '^'r, we apply the first part of Lemma 1 1 with i7o = '^'r, Fq = [^] x [ 
and ei = ^ to obtain w.h.p. 



n\ 




< 
< 



Po 
2po 



—VtV^^Vt - Vt 
Po 



Po 



(18) 



Now consider one of the Type-2 terms
Let $X^*$ be the adjoint of $X$. The last five inequalities (14)-(18) yield w.h.p.

||'P'i'n(a\!^d)'^7"'^*|| = \\'P^n{n,\ni)VrAk-iSk-2---<SiVrTk\ 



/po(l -t) f'^\'' ^ I log^n 



log n \4: J y po(l - ^) 

< C'^o(^\^ . (19) 

It is not hard to check that this inequality also holds for the X's associated with other Type-2 terms, 
except for the term (Jlnw^x) {~1'^tE*), which is discussed later. We are ready to bound the operator 
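norm of the Type-2 term using a standard $\epsilon$-net argument. Let $S^{n-1}$ be the unit sphere in $\mathbb{R}^n$, and $N$ be
a 1/2-net of $S^{n-1}$ of size at most $6^n$. The definition and Lipschitz property of the operator norm give
that

$$\left\|X\left(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*\right)\right\|
= \sup_{x, y \in S^{n-1}} \left\langle x y^T, X\left(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*\right) \right\rangle
\le 4 \sup_{x, y \in N} \left\langle x y^T, X\left(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*\right) \right\rangle.$$

For a fixed pair $(x, y) \in N \times N$, we have

$$\left\langle x y^T, X\left(\gamma \mathcal{P}_T \mathcal{P}_{\Omega_r \setminus \Omega_d} E^*\right) \right\rangle
= \gamma \left\langle \mathcal{P}_{\Phi \cap (\Omega_r \setminus \Omega_d)} \mathcal{P}_T X^*(x y^T), \mathrm{sgn}(S^*) \right\rangle.$$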



We condition on the event that (19) holds. Because $\mathrm{sgn}(S^*)$ has i.i.d. symmetric $\pm 1$ entries, Hoeffding's
inequality gives

F (l{V^n(nAn,)VTX* (xy^) , sgn(5*)> > ^ 



4^k 

( 2-^ \ 

< 2exp ' 



< 2 exp 



< 2 exp 



2 ■ 

r^i 1 

^ ' 42fe 



1 ^ 1 

< 2exp(-C"r2) 

for some constant C that can be made large. This probability is exponentially small, so we can apply 
union bound over the 6" pairs (x, y) in the e-net N x N and conclude that w.h.p. 

For the exceptional term {TZqw^x) (^I'PtE*), a similar bound holds as follows. The proof can be found 
in the Appendix. 

Lemma 4. Under the assumption of Theorem 2, the following holds with high probability:

m^^^)^x){'lVrE*)\\<^. 
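
The Hoeffding step used for a fixed pair (x, y) in the net argument above can be illustrated numerically. The sketch below is a generic check under stated assumptions, not part of the proof: the weight vector stands in for the entries of the fixed matrix that is paired with the random signs, and the function names, weights, threshold and number of trials are all hypothetical choices.

```python
import math
import random


def hoeffding_bound(weights, t):
    """Upper bound on P(|sum_i w_i s_i| > t) for i.i.d. symmetric +/-1 signs s_i."""
    sum_sq = sum(w * w for w in weights)
    return 2.0 * math.exp(-t * t / (2.0 * sum_sq))


def empirical_tail(weights, t, trials=20000, seed=0):
    """Monte Carlo estimate of P(|sum_i w_i s_i| > t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(w if rng.random() < 0.5 else -w for w in weights)
        if abs(total) > t:
            hits += 1
    return hits / trials


# 50 equal weights with unit Euclidean norm (an arbitrary illustrative choice).
weights = [1.0 / math.sqrt(50)] * 50
print(hoeffding_bound(weights, 2.0), empirical_tail(weights, 2.0))
```

The bound depends on the weights only through their squared Euclidean norm, which is why a Frobenius-norm estimate such as (19) is what the argument needs before taking the union bound over the net.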
Summing over all $2^k - 1 = \mathrm{poly}(n)$ Type-2 terms and combining with the bound (13) for the Type-1
term, it follows that the right hand side of ( fT2| ) is bounded by Summing over k = 1, 2, . . . , A;o bounds 
the second term in ( flQ] ) by |, which, together with the bound ( |lT] ) for the first term, completes the proof 
of inequality (d) in (|2]). 

Inequality (e): Bounding $\|\mathcal{P}_{T^\perp} \gamma E^*\|$

A standard argument about the norm of a matrix with i.i.d. entries [13] and [4, Proposition 3] give



\Vr^lE*\\ < 7 {\\Vn,\n,E*\\ + \\Vn,E*\\) < 



32^Po{d + l)nlog 



n 



Under the assumption of Theorem [2| the right hand side is no larger than |. Therefore, inequality (e) in 

^ holds. 

This completes the proof of Theorem 2. As mentioned in Section III-A, Theorem 1 also follows.



IV. Proof of Theorem 3

The proof is along the lines of that in [4] and has three steps: (a) writing down a sufficient optimality
condition, stated in terms of a dual certificate, for $(\mathcal{P}_\Phi(A^*), B^*)$ to be the optimum of the convex
program (1), (b) constructing a particular candidate dual certificate, and (c) showing that under the
imposed conditions this candidate does indeed certify that $(\mathcal{P}_\Phi(A^*), B^*)$ is the optimum. Part (b) is the
"art" in this method; different ways to devise dual certificates can yield different sufficient conditions for 
exact recovery. Indeed this is the main difference between this paper and [[H. 

1) Optimality conditions: For the sake of completeness, we restate here a first-order sufficient condition
that guarantees $(\mathcal{P}_\Phi(A^*), B^*)$ to be the optimum of (1). The reader is referred to [4] for a proof.

Lemma 5 (A Sufficient Optimality Condition [4]). The pair $(\mathcal{P}_\Phi(A^*), B^*)$ is the unique optimal solution
of (1) if

(a) $\Gamma^c \cap T = \{0\}$.

(b) There exists a dual matrix $Q \in \mathbb{R}^{n_1 \times n_2}$ satisfying $\mathcal{P}_{\Phi^c}(Q) = 0$ and

$$\begin{aligned}
\mathcal{P}_T(Q) &= UV^T, & \|\mathcal{P}_{T^\perp}(Q)\| &< 1, \\
\mathcal{P}_{\Gamma^c}(Q) &= \gamma \mathcal{P}_\Phi(\mathrm{sgn}(A^*)), & \|\mathcal{P}_\Gamma(Q)\|_\infty &< \gamma.
\end{aligned} \tag{20}$$

Lemma 5 provides a first-order sufficient condition for $(\mathcal{P}_\Phi(A^*), B^*)$ to be the optimum of (1). Condition
(a) in the lemma guarantees that the sparse matrices and low-rank matrices can be distinguished without
ambiguity. In other words, no matrix other than the zero matrix can be both sparse and low-rank.
The following lemma gives a sufficient guarantee for condition (a). We construct the dual matrix $Q$
in the next subsection and prove condition (b) afterwards.



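Lemma 6. If $\alpha < 1$, then $\Gamma^c \cap T = \{0\}$.

Proof: It is clear that $0 \in \Gamma^c \cap T$. In order to obtain a contradiction, assume that there exists
a non-zero matrix $M \in \Gamma^c \cap T$. By idempotency of orthogonal projections, we have $M = \mathcal{P}_{\Gamma^c}(M) =
\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(M))$ and hence

$$\begin{aligned}
\|\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(M))\|_\infty
&= \|UU^T \mathcal{P}_{\Gamma^c}(M) + \mathcal{P}_{\Gamma^c}(M) VV^T - UU^T \mathcal{P}_{\Gamma^c}(M) VV^T\|_\infty \\
&\le \|UU^T \mathcal{P}_{\Gamma^c}(M)\|_\infty + \|\mathcal{P}_{\Gamma^c}(M) VV^T\|_\infty + \|UU^T \mathcal{P}_{\Gamma^c}(M) VV^T\|_\infty \\
&\le \max_i \|UU^T e_i\| \max_j \|\mathcal{P}_{\Gamma^c}(M) e_j\| + \max_i \|e_i^T \mathcal{P}_{\Gamma^c}(M)\| \max_j \|VV^T e_j\| \\
&\quad + \max_i \|e_i^T UU^T\| \, \|\mathcal{P}_{\Gamma^c}(M)\| \max_j \|VV^T e_j\| \\
&\le \max_i \|UU^T e_i\| \sqrt{d} \, \|\mathcal{P}_{\Gamma^c}(M)\|_\infty + \sqrt{d} \, \|\mathcal{P}_{\Gamma^c}(M)\|_\infty \max_j \|VV^T e_j\| \\
&\quad + \max_i \|UU^T e_i\| \, d \, \|\mathcal{P}_{\Gamma^c}(M)\|_\infty \max_j \|VV^T e_j\| \\
&\le \alpha \|\mathcal{P}_{\Gamma^c}(M)\|_\infty = \alpha \|\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(M))\|_\infty.
\end{aligned} \tag{21}$$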



Here, we used the fact that ^/^-i < — ^ — r since both terms do not exceed 1 by assumption. 

y ni y n2 — y max(ni,n2) •' ^ 

Hence, $\|M\|_\infty = 0$, or equivalently, $M = 0$. This is a contradiction.

■ 
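
The explicit form of the projection used in the proof above, P_T(M) = UU^T M + M VV^T - UU^T M VV^T, together with the complementary projection (I - UU^T) M (I - VV^T), can be checked numerically. The NumPy sketch below is only an illustration: the sizes, the random seed and the function names proj_T and proj_T_perp are assumptions, and U, V are random orthonormal factors rather than the factors of any particular B*.

```python
import numpy as np


def proj_T(M, U, V):
    """P_T(M) = U U^T M + M V V^T - U U^T M V V^T."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ M + M @ PV - PU @ M @ PV


def proj_T_perp(M, U, V):
    """P_{T^perp}(M) = (I - U U^T) M (I - V V^T)."""
    n1, n2 = M.shape
    return (np.eye(n1) - U @ U.T) @ M @ (np.eye(n2) - V @ V.T)


rng = np.random.default_rng(0)
n1, n2, r = 8, 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
M = rng.standard_normal((n1, n2))

PM = proj_T(M, U, V)
print(np.allclose(proj_T(PM, U, V), PM))                    # idempotency of P_T
print(np.allclose(PM + proj_T_perp(M, U, V), M))            # the two projections sum to M
print(np.isclose(np.sum(PM * proj_T_perp(M, U, V)), 0.0))   # and are orthogonal
```

Idempotency is exactly the property invoked to write M = P_T(P_{Γ^c}(M)) for a matrix lying in both subspaces.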

2) Dual Certificate: We now describe our main innovation, a new way to construct the candidate dual
certificate $Q$, which is different from the ones in [4]. We construct $Q$ as the minimum norm solution to
the equality constraints in Lemma 5. As a first step, consider two matrices $Q_a$ and $Q_b$ defined as follows:
with $M^* = \gamma \, \mathrm{sgn}(A^*)$ and $N^* = UV^T$, let

$$\begin{aligned}
Q_a &= M^* - \mathcal{P}_T(M^*) + \mathcal{P}_{\Gamma^c}(\mathcal{P}_T(M^*)) - \mathcal{P}_T(\mathcal{P}_{\Gamma^c}(\mathcal{P}_T(M^*))) + \cdots \\
Q_b &= N^* - \mathcal{P}_{\Gamma^c}(N^*) + \mathcal{P}_T(\mathcal{P}_{\Gamma^c}(N^*)) - \mathcal{P}_{\Gamma^c}(\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(N^*))) + \cdots
\end{aligned}$$

Lemma 7 below establishes that $Q_a$ and $Q_b$ as described above are well-defined, i.e., that
the infinite summations converge, under the conditions of the theorem. Note that when this is the case,
we have that

$$\begin{aligned}
\mathcal{P}_T(Q_b) &= UV^T, & \mathcal{P}_T(Q_a) &= 0, \\
\mathcal{P}_{\Gamma^c}(Q_a) &= \gamma \mathcal{P}_\Phi(\mathrm{sgn}(A^*)), & \mathcal{P}_{\Gamma^c}(Q_b) &= 0.
\end{aligned} \tag{22}$$

From (22), it is clear that $Q = Q_a + Q_b$ satisfies the equality conditions in (20) and also $\mathcal{P}_{\Phi^c}(Q) = 0$. In
the next subsection, we will show that the inequality conditions are also satisfied under the assumptions
of Theorem 3.

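Lemma 7. If $\alpha < 1$, then $Q_a$ and $Q_b$ exist, i.e., the sums converge.

Proof: For any matrix $W \in \mathbb{R}^{n_1 \times n_2}$, let $S_W = W + \mathcal{P}_T(\mathcal{P}_{\Gamma^c}(W)) + \mathcal{P}_T(\mathcal{P}_{\Gamma^c}(\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(W)))) + \cdots$.
It suffices to show that $S_W$ converges for all $W$, since $Q_a = M^* - \mathcal{P}_\Gamma(S_{\mathcal{P}_T(M^*)})$ and $Q_b = S_{N^*} - \mathcal{P}_{\Gamma^c}(S_{N^*})$.
Notice that $\|\mathcal{P}_T(\mathcal{P}_{\Gamma^c}(W))\|_\infty \le \alpha \|\mathcal{P}_{\Gamma^c}(W)\|_\infty \le \alpha \|W\|_\infty$ as shown in (21), and hence $S_W$ converges
geometrically.

■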
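
The alternating series defining Q_a and Q_b can be summed numerically on a small example to see the equality constraints emerge. The NumPy sketch below makes several simplifying assumptions that are not taken from the paper: all entries are observed, the rank-one factors are chosen to be perfectly incoherent, the sparse support has one entry per row and column, and the values of n and gamma as well as all function names are hypothetical.

```python
import numpy as np

n, gamma = 30, 0.1                       # illustrative size and weight (assumptions)
U = np.ones((n, 1)) / np.sqrt(n)         # incoherent rank-1 factors
V = np.array([(-1.0) ** i for i in range(n)]).reshape(n, 1) / np.sqrt(n)
PU, PV = U @ U.T, V @ V.T

mask = np.zeros((n, n))                  # indicator of the sparse support (plays Gamma^c)
signs = np.zeros((n, n))
for i in range(8):                       # one entry per row and per column
    mask[i, (3 * i + 1) % n] = 1.0
    signs[i, (3 * i + 1) % n] = 1.0 if i % 2 == 0 else -1.0


def P_T(M):
    return PU @ M + M @ PV - PU @ M @ PV


def P_S(M):
    """Projection onto matrices supported on the mask."""
    return mask * M


def alternating_series(W, first, second, n_terms=60):
    """Partial sum of W - first(W) + second(first(W)) - first(second(first(W))) + ..."""
    total, term, sign = W.copy(), W.copy(), -1.0
    for k in range(n_terms):
        term = (first, second)[k % 2](term)
        total += sign * term
        sign = -sign
    return total


M_star = gamma * signs
N_star = U @ V.T
Q_a = alternating_series(M_star, P_T, P_S)
Q_b = alternating_series(N_star, P_S, P_T)
Q = Q_a + Q_b
# Residuals of the two equality constraints on Q.
print(np.abs(P_T(Q) - N_star).max(), np.abs(P_S(Q) - M_star).max())
```

In this toy setting the terms shrink quickly because the two subspaces are nearly orthogonal; in general the rate is governed by the constant alpha of Lemma 7.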

3) Certification: Considering $Q = Q_a + Q_b$ as a candidate for the dual matrix, we need to show that the
conditions in (20) are satisfied under the conditions of the theorem. As we showed in the previous
subsection, the equality conditions are satisfied by construction of $Q_a$ and $Q_b$. To prove the inequality
conditions, we first bound the projections of $Q$ onto the orthogonal complement spaces in the next lemma.

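Lemma 8. If $\alpha < 1$, then

$$\|\mathcal{P}_\Gamma(Q)\|_\infty \le \frac{1}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1 n_2}} + \alpha\gamma\right),$$

$$\|\mathcal{P}_{T^\perp}(Q)\| \le \frac{\eta d}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1 n_2}} + \gamma\right).$$

Proof: Using the definition of $S_W$, for any matrix $W \in \mathbb{R}^{n_1 \times n_2}$ we get $\|S_W\|_\infty \le \frac{1}{1-\alpha}\|W\|_\infty$
because of the geometrical convergence. Thus, we have

$$\begin{aligned}
\|\mathcal{P}_\Gamma(Q)\|_\infty &= \|\mathcal{P}_\Gamma(S_{N^* - \mathcal{P}_T(M^*)})\|_\infty \\
&\le \|S_{N^* - \mathcal{P}_T(M^*)}\|_\infty \\
&\le \frac{1}{1-\alpha}\|N^* - \mathcal{P}_T(M^*)\|_\infty \\
&\le \frac{1}{1-\alpha}\left(\|N^*\|_\infty + \|\mathcal{P}_T(M^*)\|_\infty\right) \\
&\le \frac{1}{1-\alpha}\left(\|N^*\|_\infty + \alpha\|M^*\|_\infty\right) \\
&\le \frac{1}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1 n_2}} + \alpha\gamma\right).
\end{aligned}$$

In the last inequality we use the incoherence assumptions for the sparse and low-rank matrices. By orthonormality
of $U$ and $V$, we have $\|I - UU^T\| \le 1$ and $\|I - VV^T\| \le 1$. Hence,

$$\begin{aligned}
\|\mathcal{P}_{T^\perp}(Q)\|
&= \|\mathcal{P}_{T^\perp}\left(M^* - \mathcal{P}_{\Gamma^c}(S_{N^* - \mathcal{P}_T(M^*)})\right)\| \\
&= \|(I - UU^T)\left(M^* - \mathcal{P}_{\Gamma^c}(S_{N^* - \mathcal{P}_T(M^*)})\right)(I - VV^T)\| \\
&\le \|M^* - \mathcal{P}_{\Gamma^c}(S_{N^* - \mathcal{P}_T(M^*)})\| \\
&\le \eta d \, \|M^* - \mathcal{P}_{\Gamma^c}(S_{N^* - \mathcal{P}_T(M^*)})\|_\infty \\
&\le \eta d \left(\|M^*\|_\infty + \|S_{N^* - \mathcal{P}_T(M^*)}\|_\infty\right) \\
&\le \eta d \left(\gamma + \frac{1}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1 n_2}} + \alpha\gamma\right)\right) \\
&\le \frac{\eta d}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1 n_2}} + \gamma\right).
\end{aligned}$$

Here, again we are using the incoherence assumptions on the sparse and low-rank matrices. This concludes
the proof of the lemma. ■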



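Finally, to satisfy (20), we require

$$\|\mathcal{P}_{T^\perp}(Q)\| \le \frac{\eta d}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1 n_2}} + \gamma\right) < 1,$$

$$\|\mathcal{P}_\Gamma(Q)\|_\infty \le \frac{1}{1-\alpha}\left(\sqrt{\frac{\mu r}{n_1 n_2}} + \alpha\gamma\right) < \gamma.$$

Combining these two inequalities, we get

$$\frac{1}{1-2\alpha}\sqrt{\frac{\mu r}{n_1 n_2}} < \gamma < \frac{1-\alpha}{\eta d} - \sqrt{\frac{\mu r}{n_1 n_2}},$$

as stated in the assumptions of the theorem.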
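
To make the resulting range concrete, the two requirements can be solved for gamma in a few lines. The Python sketch below simply rearranges the two inequalities above; the function name gamma_interval and the numerical values in the usage line are hypothetical and chosen only for illustration.

```python
import math


def gamma_interval(mu, r, n1, n2, alpha, eta, d):
    """Open interval (lower, upper) of gamma values allowed by the two inequalities.

    Returns None when alpha >= 1/2 or when the interval is empty.
    """
    if not 0.0 <= alpha < 0.5:
        return None
    base = math.sqrt(mu * r / (n1 * n2))
    lower = base / (1.0 - 2.0 * alpha)        # from the infinity-norm requirement
    upper = (1.0 - alpha) / (eta * d) - base  # from the operator-norm requirement
    return (lower, upper) if lower < upper else None


print(gamma_interval(mu=1.0, r=2, n1=100, n2=100, alpha=0.1, eta=1.0, d=2))
```

Note that the lower bound is finite only when alpha is below 1/2, which is stronger than the condition alpha < 1 needed for the series to converge.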

V. Experiments 

In this section, we illustrate the power of our method via some simulation results. These results show that 
the behavior of the algorithm agrees with the theoretical results. 
Fig. 1. For a rank two matrix of size $n$, with probability of corruption $\tau = 0.1$ and no adversarial noise ($d = 0$), we plot the minimum
probability of observation $p_0$ required for successful recovery of the low-rank matrix as $n$ gets larger.

We investigate how the algorithm performs as the size of the low-rank matrix gets larger. In other words, 
we try to see how the requirements for the success of our algorithm change as the size of the matrix 
grows. These simulation results show that the conditions get relaxed more and more as n increases. We 
run three experiments as follows: 

(1) Minimum Required Observation Probability: We generate a rank two matrix ($r = 2$) of size $n$ by
multiplying a random $n \times 2$ matrix and a random $2 \times n$ matrix, and then corrupt the entries randomly
with probability $\tau = 0.1$ without any adversarial noise ($d = 0$). The entries of the corrupted matrix are
observed independently with probability $p_0$. We then solve (1) using the method in [14]. Success is
declared if we recover the low-rank matrix with a relative error less than 10^^ measured in Frobenius 
norm. The experiment is repeated 10 times and we count the frequency of success. For any fixed
$n$, if we start from $p_0 = 1$ and decrease $p_0$, at some point the frequency of success jumps
from one to zero, i.e., we observe a phase transition. In Fig. 1 we plot the $p_0$ at which the phase
transition happens versus the size of the matrix. This experiment shows that the phase transition $p_0$
goes to zero as $n$ increases, as predicted by the theorem.

(2) Maximum Tolerable Corruption Probability: As before, we generate a rank two matrix
($r = 2$) of size $n$, with observation probability $p_0 = 0.9$ and without any adversarial noise ($d = 0$).
For any fixed $n$, if we start from $\tau = 0$ and increase $\tau$, at some point the frequency of
success jumps from one to zero. Fig. 2 illustrates how the phase transition $\tau$ changes as the size of
the matrix increases. This experiment shows that a higher probability of corruption can be tolerated
as the size of the matrix increases, as predicted by the theorem.

(3) Maximum Tolerable Adversarial/Deterministic Noise: As before, we generate a rank two
matrix ($r = 2$) of size $n$, with observation probability $p_0 = 0.5$ and corruption probability $\tau = 0.1$.
We add the adversarial noise in the form of a $d \times d$ block of 1's lying on the diagonal of the original
matrix. Notice that this is potentially a hard case for recovering the low-rank matrix, since all the adversarial
corruptions are concentrated in a burst as opposed to being spread over the matrix (as Bernoulli corruptions are). We find the
maximum possible $d$ at which the frequency of success goes from 1 to 0 (phase transition). In
Fig. 3 we plot this phase transition $d$ versus the size of the matrix and, as the deterministic theorem
predicts, it grows linearly in $n$.
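
The data generation shared by the three experiments can be sketched as follows. This NumPy sketch is not the code used for the reported simulations: the Gaussian factors, the random sign values of the Bernoulli corruptions, the placement of the block in the top-left corner of the diagonal, the tolerance argument, and the names make_instance and is_success are all assumptions made for illustration. Solving the convex program (1) itself is left to an external solver and is not shown.

```python
import numpy as np


def make_instance(n, p0, tau, d=0, r=2, seed=0):
    """Generate one synthetic test instance.

    Returns (low_rank, corrupted, observed_mask, corruption_mask).
    """
    rng = np.random.default_rng(seed)
    low_rank = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    corruption_mask = rng.random((n, n)) < tau      # Bernoulli(tau) corruptions
    noise = rng.choice([-1.0, 1.0], size=(n, n))    # assumed corruption values
    corrupted = low_rank + corruption_mask * noise
    if d > 0:                                       # d x d block of 1's on the diagonal
        corrupted[:d, :d] += 1.0
        corruption_mask[:d, :d] = True
    observed_mask = rng.random((n, n)) < p0         # Bernoulli(p0) observations
    return low_rank, corrupted, observed_mask, corruption_mask


def is_success(estimate, low_rank, tol):
    """Declare success when the relative Frobenius-norm error is below tol."""
    return np.linalg.norm(estimate - low_rank) / np.linalg.norm(low_rank) < tol
```

A phase-transition curve would then be traced by sweeping one of p0, tau or d for each n, running the solver on repeated instances, and recording where the success frequency drops from one to zero.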
Fig. 2. For a rank two matrix of size $n$, with probability of observation $p_0 = 0.9$ and no adversarial noise ($d = 0$), we plot the maximum
probability of corruption $\tau$ tolerable for successful recovery of the low-rank matrix as $n$ gets larger.
Fig. 3. For a rank two matrix of size $n$, with probability of observation $p_0 = 0.5$ and probability of corruption $\tau = 0.1$, and with
adversarial/deterministic noise in the form of a $d \times d$ block of 1's lying on the diagonal of the matrix, we plot the maximum size of the
adversarial noise $d$ tolerable for successful recovery of the low-rank matrix as $n$ gets larger.
Appendix

Here we provide several technical lemmas that are needed in the proof of the unified guarantees. We
first state the non-commutative Bernstein inequality, which is useful in the sequel. The version presented
below was first proved in [12], [9] and later sharpened in [15].



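Lemma 9 ([15, Remark 6.3]). Consider a finite sequence $\{Z_k\}$ of independent, random $n_1 \times n_2$ matrices that
satisfy the assumption $\mathbb{E} Z_k = 0$ and $\|Z_k\| \le D$ almost surely. Let $\sigma^2 = \max\left\{\left\|\sum_k \mathbb{E}[Z_k Z_k^T]\right\|, \left\|\sum_k \mathbb{E}[Z_k^T Z_k]\right\|\right\}$.
Then for all $t \ge 0$ we have

$$\mathbb{P}\left[\left\|\sum_k Z_k\right\| > t\right] \le (n_1 + n_2) \exp\left(-\frac{t^2}{2\sigma^2 + \frac{2}{3} D t}\right) \tag{23}$$

$$\le \begin{cases}
(n_1 + n_2) \exp\left(-\frac{3t^2}{8\sigma^2}\right), & \text{for } t \le \frac{\sigma^2}{D}, \\
(n_1 + n_2) \exp\left(-\frac{3t}{8D}\right), & \text{for } t \ge \frac{\sigma^2}{D}.
\end{cases} \tag{24}$$

Without loss of generality, we only consider the case $n_1 = n_2 = n$. Recall that we have defined $\alpha$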

+ 



fird 



3W^. Under the assumptions of Theorem 



a is a sufficiently small constant bounded 



^r(e.e7)||;< 



Vi, j, which follow from 



max{ni,7i2} 

away from 1. We will make use of the following estimates 
the incoherence assumptions of U and V . 
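
For orientation, the non-commutative Bernstein tail bound of Lemma 9 and its two-regime simplification are easy to evaluate numerically. The Python sketch below assumes the standard form of the bound, (n1 + n2) exp(-t^2 / (2 sigma^2 + (2/3) D t)), with the simplified exponents 3t^2/(8 sigma^2) for t at most sigma^2/D and 3t/(8D) otherwise; the function names and the sample parameters are hypothetical.

```python
import math


def bernstein_bound(n1, n2, sigma2, D, t):
    """(n1 + n2) * exp(-t^2 / (2 sigma^2 + (2/3) D t))."""
    return (n1 + n2) * math.exp(-t * t / (2.0 * sigma2 + 2.0 * D * t / 3.0))


def bernstein_simplified(n1, n2, sigma2, D, t):
    """Two-regime relaxation: sub-Gaussian for small t, sub-exponential for large t."""
    if t <= sigma2 / D:
        return (n1 + n2) * math.exp(-3.0 * t * t / (8.0 * sigma2))
    return (n1 + n2) * math.exp(-3.0 * t / (8.0 * D))


for t in (0.5, 1.0, 2.0, 5.0, 20.0):
    print(t, bernstein_bound(50, 50, 1.0, 0.5, t), bernstein_simplified(50, 50, 1.0, 0.5, t))
```

The relaxation is never smaller than the original bound, and the two coincide at the crossover t = sigma^2 / D; in the lemmas that follow, the deviation level is chosen so that the simplified bound is polynomially small in n.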

We start with the proof of Lemma 2. We need one simple lemma for the deterministic set $\Gamma_d$.

Lemma 10. For any matrix Z eT, we have 



Proof Since Z eT, Z = UX^ 
B* gives 



\\Vr<,{Z)\\^<a\\Z\\p 
U^YV^ for some X,Y e 



". For 1 < j < n, incoherence of 



max\eJUX~^ ej\ < 




J|l2 



Therefore, we have 



It follows that 



\Vr^{UX^ 



< 



j 



J2\\Vr^,iUX^)e 

j 

^2 11 vT||2 

= " 11-^ \\f 



J|l2 



2 

J|l2 



Similarly, we have ||Prj(t^^V" )||^ < \\Y\\p. The lemma then follows from the triangular inequality 
and ||Z||^ = ||X||^ + ||V||^. ■ 
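We now turn to the proof of Lemma 2. In fact, we will prove a slightly more general result as below.

Lemma 11. Suppose $\Omega_0$ is a set of indices obeying $\Omega_0 \sim \mathrm{Ber}(p)$, and $\Gamma_0$ is a fixed set of indices.
1) For any $\beta > 1$, we have

$$\left\|p^{-1} \mathcal{P}_T \mathcal{P}_{\Omega_0 \cap \Gamma_0} \mathcal{P}_T - \mathcal{P}_T \mathcal{P}_{\Gamma_0} \mathcal{P}_T\right\| \le \epsilon_1$$

with probability at least $1 - 2n^{2-2\beta}$ provided $1 > \epsilon_1 \ge \sqrt{\frac{32 \beta \mu r \log n}{3np}}$.
2) If in addition $\Gamma_0 = \Gamma_d$, where $\Gamma_d$ satisfies the assumptions in Theorem 2, then

$$\left\|p^{-1} \mathcal{P}_T \mathcal{P}_{\Omega_0 \cap \Gamma_d} \mathcal{P}_T - \mathcal{P}_T\right\| \le \epsilon_1 + \alpha$$

with the same probability.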

Proof: We will use Lemma 9 to bound the operator norm of the random component $p^{-1} \mathcal{P}_T \mathcal{P}_{\Omega_0 \cap \Gamma_0} \mathcal{P}_T -
\mathcal{P}_T \mathcal{P}_{\Gamma_0} \mathcal{P}_T$. To this end, we need to write the random component as a sum of zero-mean, independent
random variables, and then show that each of them is bounded almost surely and their sum has small second
moment. Now for the details. For $(i,j) \in \Gamma_0$, define the indicator random variables $\delta_{ij} = 1_{\{(i,j) \in \Omega_0 \cap \Gamma_0\}}$;
so $\delta_{ij}$ equals one with probability $p$ and zero otherwise, and is independent of all others. For any $Z \in T$,
observe that $Z_{ij} = \langle e_i e_j^T, Z \rangle = \langle \mathcal{P}_T(e_i e_j^T), Z \rangle$ for $(i,j) \in \Gamma_0$, and thus

p-^VrVnonToVrZ - VrVr.VrZ 

(ij)6ro 
(ij)6ro 

Here Sij : M"^" i— )■ M"^" is a self-adjoint random operator with E [Sij 
Bernstein inequality, we need to bound \\Sij\\, and E 



sup II {p'^5ij - l) (VrieieJ), Z) VriciC 

2 fir 



= 0. To use the non-commutative 
. To this end, we have 



< sup p \\Vr{eie^^ "'^ 



np 



On the other hand, for any Z e T v^/e have Sfj{Z) = {p ^5ij — l)'^ (^ZijVr{eiej)~^ , eieJ)Vr{eieJ)- 
Therefore 



E 



(ij)Gro 



E W^rieiej^WpZ^jPrieie 

Yl W'PrieiejVWlz.jieieJ 
2pr 



n 



(*,i)6ro 



(p--l)^rro(^)ll.<(p-- 1)^11^11., 



which means 
and obtain 



E 



E^- i^^v Sf}\ < ^. When ei > max | 



32j3^r log n 32/3/ir log n 
3np 



P 



3np 



we apply Lemma 



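Therefore, $\left\|p^{-1} \mathcal{P}_T \mathcal{P}_{\Omega_0 \cap \Gamma_0} \mathcal{P}_T - \mathcal{P}_T \mathcal{P}_{\Gamma_0} \mathcal{P}_T\right\| \le \epsilon_1$ w.h.p., which proves the first part of the lemma. On the
other hand, when $\Gamma_0 = \Gamma_d$, Lemma 10 gives

$$\left\|\mathcal{P}_T \mathcal{P}_{\Gamma_d} \mathcal{P}_T - \mathcal{P}_T\right\|
= \sup_{\|Z\|_F = 1} \left\|\left(\mathcal{P}_T \mathcal{P}_{\Gamma_d} \mathcal{P}_T - \mathcal{P}_T\right) Z\right\|_F
\le \max_{\|Z\|_F = 1} \alpha \left\|\mathcal{P}_T Z\right\|_F \le \alpha.$$

The second part of the lemma then follows from the triangle inequality. ■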
The next three lemmas bound the norms of certain random matrices. Their proofs follow the same spirit
as Lemma 11, by decomposing the random component into the sum of independent, bounded variables
with small second moments, and then invoking Lemma 9. The following lemma is a generalization of [2,
Theorem 6.3].

Lemma 12. Suppose VLq is a set of indices obeying VLq r^Ber{p), Vq is a fixed set of indices, and Z is a 
fixed n X n matrix. 
1 ) For any (3 > 1, we have 



< 



I (in \ogn 



3p 



\Vv,Z\ 



with probability at least 1 — ^ provided p > ^^3°^" - 
2) If in addition, Tq = Yd, where satisfies the assumptions in Theorem^ we have 



VnoZ - Z 



< 



l(5n log 72 



3p 



+ d\ \\Z\ 



with the same probability. 

Proof: For G Fq define the random variable 5ij = l{(ij)eno}- Notice that 



P 



{i,i)Gro 



Here Eij G M"^" satisfies E [Eij] = 0, \\Eij\\ < UPro^lL and 



E 












(i,i)ero 






(ij)6ro 



diag J2 ^L'---' Yl ^l. 



< {p-'-l)n\\Vr,Z\\l<p-'n\\Vr,Z 



(i,i)ero in,j)eTo 

00 ■ 



A similar calculation yields 
When p > we app 



E 



<p-'n\\Vr,Z\ 



y Lemma |9| and obtain 

P 



(*,i)6ro 



3p 



< 2nexp — - 



8/3n log n ii^ 71 

3 3p ll'ro^l 



Therefore, 



On the other hand, when Fq 



< 



3n log n 



3p 



l^ro^lloo w.h.p., which proves the first part of the lemma. 



'd, H Proposition 3] gives 



\Vr,Z-Z\ 



\Vt-Z\\ < d\\Vr-Z\\ 

I d M M d I I OO 



The second part of the lemma then follows from the triangle inequality. 
The following lemma is a generalization of [7, Lemma 3.1].
Lemma 13. Suppose $\Omega_0$ is a set of indices obeying $\Omega_0 \sim \mathrm{Ber}(p)$, $\Gamma_0$ is a fixed set of indices, and $Z$ is a
fixed $n \times n$ matrix in $T$.
1 ) For any /3 > 1 and 63 < 1, we have 

^VrVnonroVrZ - VrVr.VrZ 
with probability at least 1 - 2n'^-'^>^ provided p > 



2) If in addition, Tq = Tj, where satisfies the assumptions in Theorem |2] we have 

1 



-VrVnoVrZ - Z 



P 



<(e3 + «) II^IL 



with the same probability. 

Proof: For E Fq, set 5ij = Fix {a,b) E [n] x [n]. Notice that 

1 



VrVnonroVTZ - VtVt.VtZ 

P / a,b 

{i,i)ero (i,i)Gro 
where E [^ij] = 0. For (z, j) E Fq, we have 

\^^J\ < p-'\\Vrie,e])\\^\\Pr{eaeJ)\\^\Z,,\ 



np 



The second moment is bounded by 



E 



E 4 



(ij)ero 



J2 E [{p-'6,, - 1)2] (Vrie^eJ), e^ajy Z, 
{i,i)ero 

< ip-' - 1) rro^IlL E (^^4' ^r(eae:)>' 



(*,i)ero 

2 11^ ^ ^„ „TM|2 



{p-'-l) ||^ro^L||^r„Pr(eae,;||^ 



< (p-^-i)^rro^iiL<^ii^ro^iiL- 



When p > ^2^3^^^|g" and eg < 1, we apply Lemma |9| and obtain 



P 



-VrVnonroVrZ - VrVr.VrZ 

P / a,b 



>^3\\VroZl 



^■'-§\\VroZ\\ 



2 

00 . 



Union bound then yields 



-VrVnonroVrZ - VtVt.VtZ 



< \\Vv,Zl 
with high probability, which proves the first part of the lemma. On the other hand, when $\Gamma_0 = \Gamma_d$, by (21)
we have $\|\mathcal{P}_T \mathcal{P}_{\Gamma_d} \mathcal{P}_T Z - Z\|_\infty = \|\mathcal{P}_T \mathcal{P}_{\Gamma_d^c} Z\|_\infty \le \alpha \|Z\|_\infty$. The second part of the lemma then follows
from the triangle inequality. ■
The next two lemmas bound $\|\mathcal{P}_T E^*\|_\infty$.

Lemma 14. Under the assumption of Theorem 2, we have

\\VTVn,E*\\oo<^- 

Proof: By assumption $\Omega_d$ contains at most $d$ entries from each row/column, so repeating the proof
of Lemma 6 yields the desired bound. ■

Lemma 15. Under the assumption of Theorem |2] and conditioned on Vtr, we have 



with high probability for some constant C > 0. 

Proof: Set E = Vn^.\Q^E*; observe that each entry of E in (firn$d)\^d is non-zero with probability 
Po and has random sign, independent of each other. Since we have 

llPr^ll^ = \\VuE + PyE - PuVvEW^ 
< \\UU^E\\ +\\EVV^\\ +\\UU^EVV^\\ , 

— II lloo II lloo II lloo' 

it suffices to bound these three terms. From the incoherence property of U, we know 



lUU^W =max\eJUU^ei\ < 



and 



\eJuU^\\'<^, Vz 
I II ^ 



Now we bound Ht/^/^-EH^. For simplicity, we focus on the (1, 1) entry of (UU^ E^ and denote it as X. 

Set s"^ = eJUU^. Observe that X = Z]i:(i,i)e(an<i>d)\!^d ^'^^^'^ ' ^^^^ ^ [siEiA] = and 



I y I fxr 
\Si Ei^i\ < \si\ < — , a.s. 



n 

Var(X) = V ^Po. 



n 

i:(j,i)e(an'i'd)\f^d 



Standard bernstein inequality p4| ) thus gives 

P[|X| > t] < 2exp 



Under the assumption of Theorem |2| we can choose t = Cmaxj^logn, \^^Po logn} for some C 
sufficiently large and apply the union bound to obtain 



< C max < — logn, W — poXogn} , w.h.p. 

II lloo y Y ^ J 

Similarly, HE'V^V^^H^ is also bounded by the right hand side of the above equation. Finally, denote 
w := VV^ Ci and observe that 

[UU^EVV^)^^^ = Yl 

(i,i)e(an'i>d)\f^d 
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Then a similar application of Bernstein inequality and the union bound gives 

WUU^EVV^W < C max \ logn, —Jpo logni , w.h.p. 

M Moo 1^ 72"' n J 

The lemma follows from observing that ^ < 1 and po > under the assumptions of Theorem 2j 

Finally, we prove Lemma |4} 

Proof: (of Lemma Q Recall that by definition = {j^l^ Q'-'^l so we have 



1 



where is a matrix with independent random signed entries supported on $ fl ^7*^^^ 



The operator norm of the first term is bounded using [4, Proposition 3] and Lemma 15 



as 



Let 6, 



ab 



-{{a,6)en(i)} 



d\\\VTVn.\n,m\\^ <d-CX^^Po\ogn < C. 
1)1 , then the second term can be decomposed as 



A(-P^a)-X)Pr,Pr^f^a)r^ 



A \ —6ab - 1 ) (1 - 5a'b') Ea',b' (Prj^T (Ca'C^) , Cacj) CaC, 



= E + E 

{a',b')={a,b) {a' ,b')y^{a,b) 

We bound the operator norm of the above two terms separately. 
The diagonal term is bounded as 



E 

{a',b')=a,b 



A {5ab - 1) Ea,b {'Prj'PrieaeJ),eaeJ) e^e^ 



a.b 



< 



A (Sab - qi) Xa^b^ael 



a.b 



1 



A ( -Vnii) -XIX 



a,b(^a(^ 

a,b 



where Xa,b = (Vr^VrieaeJ),eaeJ). The first part of Lemma 12 with ^Iq = il'-^^ and Tq = [n] x [ 



n 



bounds the first term by qiXC^J^^ II^IL < ^lACy < C". We then apply [2, Lemma 6.4] and 

a standard bound of the operator norm of a random matrix to bound the second term by A(l — gi) — Ili^H < 
X'-^Wnpo\ogn<C. 
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The off-diagonal term can be expressed as 



(a',b')^{a,b) {a',b')y^(a,b) ^ 

{^a'b> - qi) (gi - Sab) Eab (Prj^T (CaC^ ) , Ca'cJ,) Ca'cJ, 



{a' ,b')j^{a,b) 
A 



(Sa'b' - qi) (1 - qi) Eab {Vr,Vr{eaeJ), ea'cj,) Ca'cJ, 



qi 

^ {a' ,b')y^{a,b) 



The operator norm of first term can be bounded using the decoupling argument in 0. In particular, we 
can repeat the proof of [2, Lemma 6.7] with p = qi, ^ab = ^ab — qi and ||'Prd'^r('3aef[) ||^ < ^ to bound 
the first term by C'X^\ogn\\E\\^ < C. Let Ha'y = J2aM^,bWa',b') Ea,b {Vr,Vr{eaeJ),ea'el); the 
second term can be bounded as 



A(l-gi) 



qi 



(Sa'b' - qi) Ea,b {Vr.VrieaeJ), Ca'cJ,) e^'C 



{a',b'ma,b) 



< X 



X 



a',b' ^ (a,fe):(a,6)^{a',6') 



qi 



< AC, 



' n log n 

qi 



\H\ 



(25) 



where we use the first part of Lemma 12 with VLq = Vl^^^ and Fq = [n] x [n] in the inequality. Further 
observe that 



Ha',b' 



SO we have 



Ea,b {Vr,Vr{eaeJ), Ca'cJ,) j - Ea'^y {Vr,Vr{ea'eJ,), Ca'cJ,) 

\ a,b / 

{Vt.VtE)^, ,, - Ea>,b' {Vr,Vr{ea'el),ea'eJ,) 



mL < ll^ri^^lloo + ll^lloo||^r(e.'eJ)||^ 



< — poiogn^ — Pologn. 

n n V n 



where we use Lemma 



15 



C" . This completes the proof of the lemma- 



It follows that the right hand side of (25 1 is bounded by A J^^C'^^Po logn < 



